-
Notifications
You must be signed in to change notification settings - Fork 0
Automated Test: monitor-incident-refactor-after #311
Automated Test: monitor-incident-refactor-after #311
Conversation
…0528) Since we'll be doing more with issue occurrences split out the concept of incidents into it's own logic module, as well as incident_occurrence into it's own module Part of GH-80527
📝 WalkthroughWalkthroughThe changes introduce incident occurrence publishing for monitor failures. A new module creates and emits incident occurrences with failure metadata and evidence, while a new function evaluates monitor environments for incident state transitions based on failure thresholds. The existing mark_failed function is refactored to delegate to this new incident threshold logic. Changes
Sequence DiagramsequenceDiagram
participant Check as Failed Check-in
participant MarkFailed as mark_failed()
participant Threshold as try_incident_threshold()
participant Store as Monitor Environment<br/>(Database)
participant Incident as Incident<br/>(Database)
participant Occurrence as create_incident_occurrence()
participant Kafka as Kafka
Check->>MarkFailed: failed_checkin, ts, received
MarkFailed->>Threshold: failed_checkin, failure_issue_threshold, received
Threshold->>Store: Get monitor_environment
alt monitor_env exists and status is OK/ACTIVE
Threshold->>Store: Fetch previous check-ins up to threshold
Threshold->>Store: Set status to ERROR, save
Threshold->>Incident: get_or_create incident
Incident-->>Threshold: incident instance
else status is ERROR
Threshold->>Incident: Retrieve active incident
end
alt Incident exists and not muted
Threshold->>Occurrence: create_incident_occurrence(failed_checkins, checkin, incident, received)
Occurrence->>Kafka: produce_occurrence_to_kafka(IssueOccurrence)
Kafka-->>Occurrence: ✓ Published
end
Threshold-->>MarkFailed: boolean result
MarkFailed-->>Check: True/False
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/sentry/monitors/logic/incident_occurrence.py`:
- Around line 159-171: In get_monitor_environment_context(MonitorEnvironment),
the function copies monitor.config to local variable config and mutates
schedule_type but then returns monitor_environment.monitor.config (the
original); fix by returning the modified local config (or assign the modified
copy back) in the returned dict under the "config" key so the schedule_type
display change is preserved.
- Around line 130-156: Fix three issues in get_failure_reason: correct the
docstring typo to "human readable"; change the membership check when building
status_counts to "if checkin['status'] in HUMAN_FAILURE_STATUS_MAP" (remove
.keys()); and replace list(status_counts.keys())[0] with
next(iter(status_counts.keys())) (or next(iter(status_counts))) when selecting
the single status for SINGULAR_HUMAN_FAILURE_MAP to avoid creating a full list.
🧹 Nitpick comments (1)
src/sentry/monitors/logic/incidents.py (1)
36-53: Unnecessary list materialization insideany().Line 52 uses
any([...])with a list comprehension. A generator expression (any(... for ...)) short-circuits without allocating the full list.Proposed fix
- if any([checkin["status"] == CheckInStatus.OK for checkin in previous_checkins]): + if any(checkin["status"] == CheckInStatus.OK for checkin in previous_checkins):
| def get_failure_reason(failed_checkins: Sequence[SimpleCheckIn]): | ||
| """ | ||
| Builds a humam readible string from a list of failed check-ins. | ||
|
|
||
| "3 missed check-ins detected" | ||
| "2 missed check-ins, 1 timeout check-in and 1 error check-in were detected" | ||
| "A failed check-in was detected" | ||
| """ | ||
|
|
||
| status_counts = Counter( | ||
| checkin["status"] | ||
| for checkin in failed_checkins | ||
| if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys() | ||
| ) | ||
|
|
||
| if sum(status_counts.values()) == 1: | ||
| return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]] | ||
|
|
||
| human_status = get_text_list( | ||
| [ | ||
| "%(count)d %(status)s" % {"count": count, "status": HUMAN_FAILURE_STATUS_MAP[status]} | ||
| for status, count in status_counts.items() | ||
| ], | ||
| last_word=_("and"), | ||
| ) | ||
|
|
||
| return _("%(problem_checkins)s check-ins detected") % {"problem_checkins": human_status} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issues: typo in docstring and static analysis findings.
- Line 132: Typo — "humam readible" → "human readable".
- Line 142: Idiomatic Python —
in HUMAN_FAILURE_STATUS_MAPsuffices (.keys()is implicit). - Line 146: Per Ruff RUF015, prefer
next(iter(status_counts.keys()))overlist(status_counts.keys())[0]to avoid materializing the full list for a single element.
Proposed fixes
- """
- Builds a humam readible string from a list of failed check-ins.
+ """
+ Builds a human readable string from a list of failed check-ins.- if checkin["status"] in HUMAN_FAILURE_STATUS_MAP.keys()
+ if checkin["status"] in HUMAN_FAILURE_STATUS_MAP- return SINGULAR_HUMAN_FAILURE_MAP[list(status_counts.keys())[0]]
+ return SINGULAR_HUMAN_FAILURE_MAP[next(iter(status_counts))]🧰 Tools
🪛 Ruff (0.14.14)
[warning] 146-146: Prefer next(iter(status_counts.keys())) over single element slice
Replace with next(iter(status_counts.keys()))
(RUF015)
🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incident_occurrence.py` around lines 130 - 156, Fix
three issues in get_failure_reason: correct the docstring typo to "human
readable"; change the membership check when building status_counts to "if
checkin['status'] in HUMAN_FAILURE_STATUS_MAP" (remove .keys()); and replace
list(status_counts.keys())[0] with next(iter(status_counts.keys())) (or
next(iter(status_counts))) when selecting the single status for
SINGULAR_HUMAN_FAILURE_MAP to avoid creating a full list.
| def get_monitor_environment_context(monitor_environment: MonitorEnvironment): | ||
| config = monitor_environment.monitor.config.copy() | ||
| if "schedule_type" in config: | ||
| config["schedule_type"] = monitor_environment.monitor.get_schedule_type_display() | ||
|
|
||
| return { | ||
| "id": str(monitor_environment.monitor.guid), | ||
| "slug": str(monitor_environment.monitor.slug), | ||
| "name": monitor_environment.monitor.name, | ||
| "config": monitor_environment.monitor.config, | ||
| "status": monitor_environment.get_status_display(), | ||
| "type": monitor_environment.monitor.get_type_display(), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: modified config copy is never used — original config is returned instead.
Line 160 creates a local copy of config and line 162 mutates schedule_type on that copy, but line 168 returns monitor_environment.monitor.config (the original, unmodified dict). The schedule_type display transformation is silently discarded.
🐛 Proposed fix
return {
"id": str(monitor_environment.monitor.guid),
"slug": str(monitor_environment.monitor.slug),
"name": monitor_environment.monitor.name,
- "config": monitor_environment.monitor.config,
+ "config": config,
"status": monitor_environment.get_status_display(),
"type": monitor_environment.monitor.get_type_display(),
}🤖 Prompt for AI Agents
In `@src/sentry/monitors/logic/incident_occurrence.py` around lines 159 - 171, In
get_monitor_environment_context(MonitorEnvironment), the function copies
monitor.config to local variable config and mutates schedule_type but then
returns monitor_environment.monitor.config (the original); fix by returning the
modified local config (or assign the modified copy back) in the returned dict
under the "config" key so the schedule_type display change is preserved.
This pull request was automatically created by
@coderabbitai/e2e-reviewer.Batch created pull request.
Summary by CodeRabbit
New Features
Refactor